Loan Data From Proper Exploration by NongR

This report explores a dataset containing prices and attributes for approximately 114,000 records and 81 variables in this data frame.

## [1] 113937     81

Before start analyzing this data, let’s update the order of the factor levels of the variables ‘CreditGrade’ and create a new variable called ‘ListingCategory’ to store a name of the Listing category

# Create a new variable called ListingCategory.
# We will later update this variable to store a name of the Listing Category
pl$ListingCategory <- pl$ListingCategory..numeric.

# Update the Term and ListingCategory columns to be a factor  
pl <- mutate_at(pl, vars(Term, ListingCategory), as.factor)


## --- Update levels
# Update levels of CreditGrade to sorted from Highest to Lowest first
levelsCreditGrade <- c("AA", "A", "B", "C", "D", "E", "HR", "NC", "")

levels(pl$CreditGrade) <- levelsCreditGrade

# Update levels of IncomeRange
levelsIncomeRange <- c("Not displayed", "Not employed", 
                       "$0", "$1-24,999", "$25,000-49,999", 
                       "$50,000-74,999", "$75,000-99,999", "$100,000+")

levels(pl$IncomeRange) <- levelsIncomeRange

# Update levels of ListingCategory

levelsListingCat <- c(
  "Not Available",         #0
  "Debt Consolidation",    #1
  "Home Improvement",      #2
  "Business",              #3
  "Personal Loan",         #4
  "Student Use",           #5
  "Auto",                  #6
  "Other",                 #7
  "Baby&Adoption",         #8
  "Boat",                  #9
  "Cosmetic Procedure",    #10
  "Engagement Ring",       #11
  "Green Loans",           #12
  "Household Expenses",    #13
  "Large Purchases",       #14
  "Medical/Dental",        #15
  "Motorcycle",            #16
  "RV",                    #17
  "Taxes",                 #18
  "Vacation",              #19
  "Wedding Loans")         #20

levels(pl$ListingCategory) <- levelsListingCat

str(pl)
## 'data.frame':    113937 obs. of  82 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "AA","A","B","C",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : Factor w/ 3 levels "12","36","60": 2 2 2 2 2 3 2 2 2 2 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "Not displayed",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
##  $ ListingCategory                    : Factor w/ 21 levels "Not Available",..: 1 3 1 17 3 2 2 3 8 8 ...

Univariate Plots Section

CreditGrade

## 
##    AA     A     B     C     D     E    HR    NC       
## 84984  3315  3509  4389  5649  5153  3289  3508   141

ListingCreatingDate

Let’s take a closely look at a count of listings of each CreditGrade.

It is very interesting that:

From the discovery above, I assume that we only need to focus on the CreditGrade ‘AA’ from 2010 for this analysis since a number of count before 2010 is closed to 0.

See this link for more details about the propser loan

https://www.lendingmemo.com/prosper-loan-complaints/

New data set

From this point on, we will analyze data where the ListingCreationDate is greater than or equal to 2010 and there are almost 83,000 records in the new data set

# Get a subset of data which has the ListingCreationData >= 2010
new_pl <- subset(pl, year(ListingCreationDate) >= 2010)
dim(new_pl)
## [1] 82675    82

BorrowerRate

In average, the BorrowRate is around 0.15 except a very high number of applications with the Borrow rate around 3.25. Why is that the case?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0400  0.1364  0.1875  0.1961  0.2573  0.3600

Occupation

Let’s zoom in data by occupation. The charts below show that most applications have the Occupations set to ‘Other’ and the LoanOriginalAmount of this category is varied a lot comparing to other occupations.

EmploymentStatus

Most applications have the EmploymentStatus ‘Employed’ and the median LoanOriginalAmount of this category is higher than others.

BorrowerState

Now, let’s take a look at a number of Listings by each state. Looks like major cities have a higher number of applications.

From the summary in terms of a number of applications in each state, ‘Wyoming’ (WY) has the lowest number of applications which is 119 and ‘California’ (CA) has the highest number of applications which is 10,479.

The median number of applications is 1161.5 and average number is 1722.4.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   119.0   427.2  1161.5  1722.4  2229.5 10479.0

The boxplot below shows that there are four states where a number of applications far exceeds the majority (outliers).

When combining information from the bar chart above and the boxplot below we can identify that those four states are California (CA), New York (NY), Texas (TX), and Florida (FL). Also notice that a number of applications in CA is also far more than the other three states.

The maps below make it clear that a number of listings is higher in the major cities.

Term

The histogram below shows that most applications have the 36-Term.

IncomeRange

Majority of the applications have the IncomeRange between $1-$50,000. There are also applications with IncomeRange = 0. Did they get a loan?

DebtToIncomeRatio

From the histogram and boxplot, most of the applications have the DebtToIncomeRatio less than 1.

However, there are also outliers where the DebtToIncomeRatio > 1 and up to DebtToIncomeRatio = 10.01. What is a status of those applications?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1500  0.2200  0.2596  0.3200 10.0100

The frequency polygon below shows that most applications have the DebtToIncomeRatio around 0.2.

ListingCategory

Most applications apply the Proper loan for ‘Debt Consolidation’ with the median LoanOriginalAmout around $10,000

CreditScoreRange

The median of the CreditScoreRangeLower is 700 and the median of the CreditScoreRangeUpper is 719.

## [1] "*** Summary of CreditScoreRangeLower ***"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   600.0   660.0   700.0   699.2   720.0   880.0
## [1] "*** Summary of CreditScoreRangeUpper ***"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   619.0   679.0   719.0   718.2   739.0   899.0

Univariate Analysis

What is the structure of your dataset?

There are approximately 114,000 records and 81 variables. After analyzing the ListingCreatingDate variable, We found that data with various CreditGrade were not created after 2009, and only data with the CreditGrade ‘AA’ was created after 2010. Hence, I only filtered data with the ListingCreatingDate is greater than or equal to 2010 for this exploration.

With the new dataset, there are approximately 83,000 records with 82 variables where an additional factor variable ‘ListingCategory’ was created to store a ListingCategory name of each record

What is/are the main feature(s) of interest in your dataset?

I am paricularly interested in finding variables that are best for describing relationship with the BorrowerRate variable.

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

I think the following variables will correlate to the BorrowerRate; Occupation, EmploymentStatus, BorrowerState, Term, IncomeRange, DebtToIncomeRatio, ListingCategory, CreditScoreRangeLower, and CreditScoreRangeUpper.

Did you create any new variables from existing variables in the dataset?

As mentioned in the previous section, I created a new variable ‘ListingCategory’ to store a text of each category to accomodate this analysis.

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

I found an unsual trend in the charts that show a number of listings based on the ListingCreatingDate and found that there are no records with various CreditGrade after 2010, and only records with the CreditGrade ‘AA’ have the ListingCreatingDate greater than or equal to 2010 so I decided to only analyze records where the ListingCreatingDate >= 2010.

Also, I updated levels of the variables ‘CreditGrade’ and ‘IncomeRange’ to ease a process of creating visualizations for those variables so that an appropriate level’s order is shown in the plots.

Bivariate Plots Section

Scatterplot Matrix

Let’s start off by creating a scatterplot matric with the variables below:

BorrowerRate, LoanOriginalAmount, StatedMonthlyIncome, DebtToIncomeRatio, CreditScoreRangeLower, CreditScoreRangeUpper, Term, EmploymentStatus

Note that we did not choose the ‘Occupation’, ‘BorrowerState’ and ‘ListingCategory’ in this plot because they have a lot of levels

The correlation matrix shows that the ‘LoanOriginalAmount’, ‘CreditScoreRangeLower’, and ‘CreditScoreRangeUpper’ variables have a strong correlation with the ‘BorrowerRate’ variable

BorrowerRate vs LoanOriginalAmount

Let’s take a look at a scatter plot of BorrowerRate (y) vs LoanOriginalAmount(x)

The correlation coefficient between BorrowerRate and LoanOriginalAmount is around -0.42 which shows a moderate correlation between those two variables

## 
##  Pearson's product-moment correlation
## 
## data:  LoanOriginalAmount and BorrowerRate
## t = -134.57, df = 82673, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4294568 -0.4182732
## sample estimates:
##        cor 
## -0.4238811

BorrowerRate vs CreditScoreRange

Let’s take a look at a scatter plot of BorrowerRate (y) vs CreditScoreRangeLower(x) and a scatter plot of BorrowerRate (y) vs CreditScoreRangeUpper(x)

The smoothed conditional mean line shows that the BorrowerRate is decreased when the CreditScoreRangeLower or CreditScoreRangeUpper is increased

The correlation coefficient between BorrowerRate and CreditScoreRangeLower is around -0.5 which shows a moderate correlation between those two variables

## 
##  Pearson's product-moment correlation
## 
## data:  CreditScoreRangeLower and BorrowerRate
## t = -166.77, df = 82673, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5068121 -0.4966108
## sample estimates:
##        cor 
## -0.5017289

The correlation coefficient between BorrowerRate CreditScoreRangeUpper is around -0.5 which shows a moderate correlation between those two variables

## 
##  Pearson's product-moment correlation
## 
## data:  CreditScoreRangeUpper and BorrowerRate
## t = -166.77, df = 82673, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5068121 -0.4966108
## sample estimates:
##        cor 
## -0.5017289

BorrowerRate vs BorrowerState

The boxplot below confirms that the BorrowerState does not have impact on the BorrowerRate since the median value of the BorrowerRate of all states is around 0.2 and the boxplots are closely aligned.

BorrowerRate vs Occupation

The boxplot below also shows that Occuputation does not correlate to the BorrowerRate since the median BorrowerRate is around 0.2 excepts the Occupations with a small number of applications that shows lower BorrowerRate, (for example, Judge)

BorrowerRate vs ListingCategory

The boxplot below also shows that ListingCategory does not correlate to the BorrowerRate since the median BorrowerRate is around 0.2.

BorrowerRate vs StatedMonthlyIncome

Let’s take a look at a scatter plot of BorrowerRate (y) vs StatedMonthlyIncome(x).

This scatter plot shows multiple outliers that is difficult to see a relation between the BorrowerRate and StatedMonthlyIncome.

So what is a summary of StatedMonthlyIncome?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3467    5000    5953    7100 1750003

Let’s create a scatter plot of BorrowerRate (y) vs StatedMonthlyIncome(x) where StatedMonthlyIncome is in the 95% percentile.

It looks like the BorrowerRate is slightly decreased when the StatedMonthlyIncome is increased.

The correlation coefficient between BorrowerRate and StatedMonthlyIncome is around -0.25 which shows a weak correlation between those two variables

## 
##  Pearson's product-moment correlation
## 
## data:  StatedMonthlyIncome and BorrowerRate
## t = -71.343, df = 78796, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2528729 -0.2397558
## sample estimates:
##        cor 
## -0.2463256

BorrowerRate vs DebtToIncomeRatio Let’s take a look at a scatter plot of BorrowerRate (y) vs DebtToIncomeRatio(x).

This scatter plot shows multiple outliers that is difficult to see a relation between the BorrowerRate and DebtToIncomeRatio

What is a summary of DebtToIncomeRatio?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    0.15    0.22    0.26    0.32   10.01    7084

Let’s create a scatter plot of BorrowerRate (y) vs DebtToIncomeRatio(x) where DebtToIncomeRatio is in the 99% percentile.

It looks like the BorrowerRate is increased when the DebtToIncomeRatio is increased.

The correlation coefficient between BorrowerRate and DebtToIncomeRatio is around 0.18 which shows a weak correlation between those two variables

## 
##  Pearson's product-moment correlation
## 
## data:  DebtToIncomeRatio and BorrowerRate
## t = 49.063, df = 74848, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1695706 0.1834521
## sample estimates:
##       cor 
## 0.1765201

LoanOriginalAmount vs StatedMonthlyIncome

Let’s take a look at the relationship between LoanOriginalAmount and StatedMonthlyIncome.

The scatter plot of LoanOriginalAmount(y) and 95 percentile of StatedMonthlyIncome (x) shows that the LoanOriginalAmount is increased when the StatedMonthlyIndome is increased.

The correlation coefficient between LoanOriginalAmount and 95 percentile of StatedMonthlyIncome is 0.41 which means that these two variables have a moderate correlation.

## 
##  Pearson's product-moment correlation
## 
## data:  StatedMonthlyIncome and LoanOriginalAmount
## t = 127.01, df = 78796, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4064179 0.4180093
## sample estimates:
##       cor 
## 0.4122303

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

There are a few features, which are LoanOriginalAmount, CreditScoreRangeLower, CreditScoreRangeLower, that have a moderate correlation with the BorrowerRate.

Additionally, there are a few, which are StatedMonthlyIncome and DebtToIncomeRatio, that shows a weak correlation with the BorrowerRate after removing the outlier.

However, other features, such as BorrowerState, Occupation, ListingCategory, do not seem to have any strong relationship with the BorrowerRate

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

The StatedMonthlyIncome and LoanOriginalAmount variables have a moderate correlation where the LoanOriginalAmount is increased when the StatedMonthlyIncome is increased

What was the strongest relationship you found?

The CreditScoreRangeLower/Update have the strongest relationship with the BorrowerRate where their correlation coefficient is around -0.5

Multivariate Plots Section

BorrowerRate, LoanOriginalAmount | Term

Let’s take a look at a scatter plot of BorrowerRate (y) vs LoanOriginalAmount(x) and use a different color for each Term.

It looks like most smaller LoanOriginalAmout has 36 Term but it is hard to tell because of overplotting. Let’s use a facet_wrap instead.

The plots below show that most of the applications have lower LoanOriginalAmount for all Terms. Also, it looks like the LoanOriginalAmount is usually about x$5000 (darker points) as we see in the plots

The conditional mean chart for different Term shows that the BorrowerRate of the 12-Term is the lowest regardless of the LoanOriginalAmount, follow by the 36-Term and 60-Term.

BorrowerRate, LoanOriginalAmount | Occupation, BorrowerState, and ListingCategory

The facet plots below show that Occupation, BorrowerState, and ListingCategory does not have a strong relationship with the BorrowRate and LoanOriginal amount since all conditional mean plots show a similar Trend.

Additionally, most applications with the LoanOriginalAmout less than $10,000 have the 36-Term with various BorrowerState regardless of Occupation, BorrowerState, and ListingCategory.

BorrowerRate, LoanOriginalAmount | EmploymentStatus

The chart below shows that applications with the EmploymentStatus ‘Not employed’ usually gets a higher BorrowerRate regardless of the LoanOriginalAmount where the BorrowRate for other EmploymentStatus is in the same range

BorrowerRate, LoanOriginalAmount | IncomeRange

The chart below shows that applications with the IncomeRange ‘Not displayed’ gets a higher BorrowerRate in general.

However, what surprise me is that the applications with the highest IncomeRange ($100,000) get a much higher BorrowerRate that other IncomeRange and applications with the lowest IncomeRange ($0) get the lowest BorrowerRate

BorrowerRate, LoanOriginalAmount | IncomeRange

As expected, applications with a lower CreditScoreRange (Lower and Upper) get the Lower BorrowerRate.

Interestingly, applications with a very high CreditScoreRange (Lower and Upper) get the BorrowerRate around 0.1 which is much lower than other CreditScoreRanges.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

There are a few features that provide better insights on the BorrowerRate when observing them with LoanOriginalAmount.

Those features are * Term: The BorrowerRate is less with a smaller term and the BorrowRate will keep decreasing as the LoanOriginalAmount is increased * EmploymentStatus: The BorrowerRate is the highest when the EmploymentStatus is ‘Not employed’ * CreditScoreRange: The BorrowerRate is higher when the CreditScore is less

Were there any interesting or surprising interactions between features?

I am surprised to see that applications with the highest IncomeRange get a much higher BorrowerRate than applications with a very small IncomeRange (e.g. $0).

I originally thought that the IncomeRange should not have any relationship with the BorrowerRate but it does not seem to be the case.

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.

I did not create any models with this dataset.

Final Plots and Summary

Plot One

Description One

I chose this plot for the Plot One section because it shows that majority of listings in this dataset has the CreditGrade ‘AA’ and they were created after 2009 as opposed to other CreditGrades which have a much smaller number of listings comparing to the CreditGrade ‘AA’ and there are no listings with those CreditGrades after 2009.

This finding help me make a decision to only analyze data of the listings that were created after 2009 in this analysis.

Plot Two

Description Two

This chart clearly shows a strong relationship between the BorrowerRate and CreditScoreRange (Lower/Upper) where the conditional mean of the BorrowerRate is decreased as the credit score range increased.

Plot Three

Description Three

This chart shows that the borrower’s interest rate is much lower for the 12-Term in average, followed by 36 and 60 Term, and as the original amound of the loan increased, the Borrower’s interest rate is decreased.


Reflection

This dataset contains a large number of variables and it was challenging to come up with a question and choose main and other features to analyze because there are so many of them.

Additionally, since I do not have knowledge about Prosper Loan before working on this project, I had a difficulty of the terms and technical details explained in the variable definition and this also impacts on the previous problem where I spent a lot of time coming up with a question and choose variables that I think it might with the analysis and, as I went through the exercise, many variables that I chose did not seem to have a relationship with the main feature.

After spending time understanding the dataset and getting a basic understanding of the ProsperLoan, I was able to identify states which does not allow Prosper Loan by inspecting the geom_map and found the backup information after this analysis.

In the Univariate analysis section, I also discovered that there are a handful of listing data for various CreditGrade that was stopped creating after 2009 and only the listing data with the CreditGrade ‘AA’ was created after 2009 and this significantly help me make a decision to only analyze the listing data created after 2009 which is the majority of the population.

Most of the variables that I chose in the Bivariate analysis does not have a strong relationship with the main feature (BorrowerRate) which I think that the biggest problem for this is because of a lack of domain knowledge as I mentioned earlier.

I found the Multivariate Analysis is quite interesting since it help giving insights for the interactions between BorrowerRate, LoanOriginalAmount and other variables. Especially, I was surprised to see that the listing with the highest IncomeRange get a much higer BorrowerRate on average comparing to the listing with a very low income (e.g. $0)

This dataset has rich features that can be used in the future analysis. I have tidied a few variables to help with this analysis, for example, re-arrange factor levels of the IncomeRange and CreditGrade variables and create a new variable to store a ListingCategory name (the original dataset only contains the Category index). There are many more variables in this dataset need to be tidied to accomodate other Exploratory Data Analysis in the future.